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Independent component analysis
of microarray data in the study of endometrial cancer.
Saidi Samir A; Holland Cathrine M; Kreil David P; MacKay
David J C; Charnock-Jones D Stephen; Print Cristin G; Smith
Stephen K

Department of Obstetrics and Gynaecology, University of 
Cambridge, Cambridge CB2 2SW, UK., samsaidi@obgyn.cam.ac.uk 
Oncogene, (2004 Aug 26) 23 (39) 6677-83. 
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Entered Medline: 20040916 
Gene microarray technology is highly effective in screening for
differential gene expression and has hence become a
popular tool in the molecular investigation of cancer. When applied to
tumours, molecular characteristics may be correlated with clinical
features such as response to chemotherapy. Exploitation of the huge
amount of data generated by microarrays is difficult, however, and
constitutes a major challenge in the advancement of this methodology.
Independent component analysis (ICA), a modern
statistical method, allows us to better understand data in such complex
and noisy measurement environments. The technique has the potential to
significantly increase the quality of the resulting data and improve the
biological validity of subsequent analysis. We performed microarray
experiments on 31 postmenopausal endometrial biopsies, comprising 11
benign and 20 malignant samples. We compared ICA to the established
methods of principal component analysis (PCA), Cyber-T, and SAM. We show
that ICA generated patterns that clearly characterized the malignant
samples studied, in contrast to PCA. Moreover, ICA improved the
biological validity of the genes identified as differentially expressed in
endometrial carcinoma, compared to those found by Cyber-T and SAM. In
particular, several genes involved in lipid metabolism that are
differentially expressed in endometrial carcinoma were only found using
this method. This report highlights the potential of ICA in the analysis
of microarray data.
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Blind source separation and the analysis of microarray 
data. 

Chiappetta P; Roubaud M C; Torresani B 

Laboratoire d'Analyse, Topologie et Probabilites, Centre de
Mathematiques et Informatique, Universite de Provence,
France.

Journal of computational biology : a journal of 
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1090-109. 
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Journal; Article; (JOURNAL ARTICLE) 
English 
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Entered STN: 20050125 
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approach for the exploratory analysis of gene
expression data, based upon blind source separation techniques.
This approach exploits higher-order statistics to identify a linear model
for (logarithms of) expression profiles, described as linear combinations
of "independent sources." As a result, it yields "elementary expression
patterns" (the "sources"), which may be interpreted as potential
regulation pathways. Further analysis of the so-obtained sources shows
that they are generally characterized by a small number of specific
coexpressed or antiexpressed genes. In addition, the projections of the
expression profiles onto the estimated sources often provide significant
clustering of conditions. The algorithm relies on a large number of runs
of "independent component analysis" with
random initializations, followed by a search of "consensus sources." It
then provides estimates for independent sources, together with an
assessment of their robustness. The results obtained on two datasets
(namely, breast cancer data and Bacillus subtilis sulfur metabolism data)
show that some of the obtained gene families correspond to well known
families of coregulated genes, which validates the proposed approach.
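
The "consensus sources" step described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the ICA runs have already been performed, and it groups their estimated sources greedily by absolute Pearson correlation. The function names, the greedy grouping rule, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def consensus_sources(runs, threshold=0.9):
    """Greedily group sources from several ICA runs by |correlation|.

    runs: list of runs; each run is a list of source vectors.
    Returns (mean_source, robustness) pairs sorted by decreasing robustness,
    where robustness is the fraction of runs contributing to the group.
    The threshold is an illustrative assumption, not a published value.
    """
    groups = []  # each group keeps a representative, its members, and run ids
    for run_id, sources in enumerate(runs):
        for s in sources:
            best, best_r = None, 0.0
            for g in groups:
                r = pearson(g["rep"], s)
                if abs(r) > abs(best_r):
                    best, best_r = g, r
            if best is not None and abs(best_r) >= threshold:
                # The sign of an ICA source is arbitrary, so align it first.
                sign = 1.0 if best_r >= 0 else -1.0
                best["members"].append([sign * x for x in s])
                best["runs"].add(run_id)
            else:
                groups.append({"rep": list(s), "members": [list(s)], "runs": {run_id}})
    result = []
    for g in groups:
        m = len(g["members"])
        mean = [sum(col) / m for col in zip(*g["members"])]
        result.append((mean, len(g["runs"]) / len(runs)))
    return sorted(result, key=lambda t: -t[1])
```

A source that reappears in nearly every run gets a robustness close to 1, while one-off sources score near 1/len(runs), which gives a rough analogue of the robustness assessment mentioned above.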
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AB We recently published a review in this journal describing the design, 
hybridisation and basic data processing required to use gene arrays to
investigate vascular biology (Evans et al. Angiogenesis 2003; 6: 93-104).
Here, we build on this review by describing a set of powerful and robust
methods for the analysis and interpretation of gene array data derived
from primary vascular cell cultures. First, we describe the evaluation of
transcriptome heterogeneity between primary cultures derived from
different individuals, and estimation of the false discovery rate
introduced by this heterogeneity and by experimental noise. Then, we
discuss the appropriate use of Bayesian t-tests, clustering and
independent component analysis to mine the
data. We illustrate these principles by analysis of a previously
unpublished set of gene array data in which human umbilical vein
endothelial cells (HUVEC) cultured in either rich or low-serum media were
exposed to vascular endothelial growth factor (VEGF)-A165 or placental
growth factor (PlGF)-1 (131). We have used Affymetrix U95A gene arrays to
map the effects of these factors on the HUVEC transcriptome. These
experiments followed a paired design and were biologically replicated
three times. In addition, one experiment was repeated using serial
analysis of gene expression (SAGE). In contrast to
some previous studies, we found that VEGF-A and PlGF consistently
regulated only small, non-overlapping and culture media-dependent sets of
HUVEC transcripts, despite causing significant cell biological changes.
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2004130013 MEDLINE
PubMed ID: 15022635 

The operons, a criterion to compare the reliability of 
transcriptome analysis tools: ICA is more reliable than 
ANOVA, PLS and PCA. 

Carpentier Anne-Sophie; Riva Alessandra; Tisseur Pierre; 
Didier Gilles; Henaut Alain 

Laboratoire Genome et Informatique, UMR 8116, Tour Evry2,
523 Place des Terrasses, 91034, Evry, France.
carpentier@genepole.cnrs.fr

Computational biology and chemistry, (2004 Feb) 28 (1) 
3-10. 

Journal code: 101157394. ISSN: 1476-9271. 
England: United Kingdom 
Journal; Article; (JOURNAL ARTICLE) 
English 

Priority Journals 
200404 

Entered STN: 20040317 
Last Updated on STN: 20040408 
Entered Medline: 20040407 
The number of statistical tools used to analyze transcriptome data is
continuously increasing and no one, definitive method has so far emerged.
There is a need for comparison and a number of different approaches have
been taken to evaluate the effectiveness of the different statistical
tools available for microarray analyses. In this paper, we describe a
simple and efficient protocol to compare the reliability of different
statistical tools available for microarray analyses. It exploits the fact
that genes within an operon exhibit the same expression patterns. In
order to compare the tools, the genes are ranked according to the most
relevant criterion for each tool; for each tool we look at the number of
different operons represented within the first twenty genes detected. We
then look at the size of the interval within which we find the most
significant genes belonging to each operon in question. This allows us to
define and estimate the sensitivity and accuracy of each statistical tool.
We have compared four statistical tools using Bacillus subtilis expression
data: the analysis of variance (ANOVA), the principal component analysis
(PCA), the independent component analysis
(ICA) and the partial least square regression (PLS). Our results show ICA
to be the most sensitive and accurate of the tools tested. In this
article, we have used the protocol to compare statistical tools applied to
the analysis of differential gene expression.
However, it can also be applied without modification to compare the
statistical tools developed for other types of transcriptome analyses,
like the study of gene co-expression.
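
The two quantities used by this protocol can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code: the function names and the gene-to-operon mapping `operon_of` are hypothetical, and the interval is simplified here to the span of ranks covering every listed gene of an operon.

```python
def operons_in_top(ranked_genes, operon_of, top_n=20):
    """Count the distinct operons represented among the first top_n ranked genes.

    ranked_genes: gene identifiers, most significant first.
    operon_of: dict mapping a gene identifier to its operon (assumed input).
    """
    return len({operon_of[g] for g in ranked_genes[:top_n] if g in operon_of})


def operon_interval(ranked_genes, operon_of, operon):
    """Size of the rank interval spanned by the genes of one operon.

    Returns last_rank - first_rank + 1, or None if no gene of the operon
    appears in the ranking. A small interval means the tool ranks the
    co-expressed genes of the operon close together.
    """
    ranks = [i for i, g in enumerate(ranked_genes) if operon_of.get(g) == operon]
    if not ranks:
        return None
    return ranks[-1] - ranks[0] + 1
```

Under this reading, a tool that places more distinct operons in its top twenty and keeps each operon's genes within a tight interval would be judged more sensitive and more accurate, respectively.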
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Network component analysis: reconstruction of regulatory 
signals in biological systems. 

Liao James C; Boscolo Riccardo; Yang Young-Lyeol; Tran Linh 
My; Sabatti Chiara; Roychowdhury Vwani P 
Departments of Chemical Engineering, University of 
California, Los Angeles, CA 90095, USA., liaoj@ucla.edu 
Proceedings of the National Academy of Sciences of the 
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Entered Medline: 20040420 
AB High-dimensional data sets generated by high-throughput technologies, such
as DNA microarray, are often the outputs of complex networked systems
driven by hidden regulatory signals. Traditional statistical methods for
computing low-dimensional or hidden representations of these data sets,
such as principal component analysis and independent
component analysis, ignore the underlying network
structures and provide decompositions based purely on a priori statistical
constraints on the computed component signals. The resulting
decomposition thus provides a phenomenological model for the observed data
and does not necessarily contain physically or biologically meaningful
signals. Here, we develop a method, called network component analysis,
for uncovering hidden regulatory signals from outputs of networked
systems, when only a partial knowledge of the underlying network topology
is available. The a priori network structure information is first tested
for compliance with a set of identifiability criteria. For networks that
satisfy the criteria, the signals from the regulatory nodes and their
strengths of influence on each output node can be faithfully
reconstructed. This method is first validated experimentally by using the
absorbance spectra of a network of various hemoglobin species. The method
is then applied to microarray data generated from yeast Saccharomyces
cerevisiae and the activities of various transcription factors during cell
cycle are reconstructed by using recently discovered connectivity
information for the underlying transcriptional regulatory networks.
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AUTHOR(S): Kreil, David Philip [Reprint Author]; MacKay, David J. C.
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AB DNA microarrays allow the measurement of transcript abundances for
thousands of genes in parallel. Most commonly, a particular sample of
interest is studied next to a neutral control, examining relative changes
(ratios). Independent component analysis
(ICA) is a promising modern method for the analysis of such experiments.
The condition of ICA algorithms can, however, depend on the
characteristics of the data examined, making algorithm properties such as
robustness specific to the given application domain. To address the lack
of studies examining the robustness of ICA applied to microarray
measurements, we report on the stability of variational Bayesian ICA in
this domain. Microarray data are usually preprocessed and transformed.
Hence we first examined alternative transforms and data selections for the
smallest modelling reconstruction errors. Log-ratio data are
reconstructed better than non-transformed ratio data by our linear model
with a Gaussian error term. To compare ICA results we must allow for ICA
invariance under rescaling and permutation of the extracted signatures,
which hold the loadings of the original variables (gene transcript ratios)
on particular latent variables. We introduced a method to optimally match
corresponding signatures between sets of results. The stability of
signatures was then examined after (1) repetition of the same analysis run
with different random number generator seeds, and (2) repetition of the
analysis with partial data sets. The effects of both dropping a
proportion of the gene transcript ratios and dropping measurements for
several samples have been studied. In summary, signatures with a high
relative data power were very likely to be retained, resulting in an
overall stability of the analyses. Our analysis of 63 yeast wild-type vs.
wild-type experiments, moreover, yielded 10 reliably identified
signatures, demonstrating that the variance observed is not just noise.
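
Because ICA results are invariant under rescaling and permutation, two sets of signatures have to be aligned before they can be compared. The sketch below shows one simple way to do this and is not the matching method introduced in the paper: it assumes signatures are plain lists of loadings, scores pairs by absolute cosine similarity (which ignores scale and sign), and searches all assignments exhaustively. The function names are hypothetical.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two vectors; unaffected by positive rescaling."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def match_signatures(set_a, set_b):
    """Find the one-to-one assignment maximizing the summed |cosine|.

    Requires len(set_b) >= len(set_a). Returns (perm, signs, mean_score)
    such that signs[i] * set_b[perm[i]] is the best match for set_a[i].
    """
    k = len(set_a)
    sim = [[cosine(a, b) for b in set_b] for a in set_a]
    best_perm, best_score = None, -1.0
    # Exhaustive search: fine for a handful of signatures only.
    for perm in permutations(range(len(set_b)), k):
        score = sum(abs(sim[i][perm[i]]) for i in range(k))
        if score > best_score:
            best_perm, best_score = perm, score
    signs = [1 if sim[i][best_perm[i]] >= 0 else -1 for i in range(k)]
    return list(best_perm), signs, best_score / k
```

The exhaustive search grows factorially with the number of signatures; for larger sets an assignment solver such as the Hungarian algorithm would be the usual substitute.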
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SOURCE: Department of Computer Science, Stanford University, 
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Genome biology, (2003) 4 (11) R76. 
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Journal; Article; (JOURNAL ARTICLE)
English 
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We apply linear and nonlinear independent component 
analysis (ICA) to project microarray data into statistically 
independent components that correspond to putative biological processes, 
and to cluster genes according to over- or under-expression in each 
component. We test the statistical significance of enrichment of gene 
annotations within clusters. ICA outperforms other leading methods, such 
as principal component analysis, k-means clustering and the Plaid model, 
in constructing functionally coherent clusters on microarray datasets from 
Saccharomyces cerevisiae, Caenorhabditis elegans and human. 
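
The abstract does not say which test was used for annotation enrichment. A common choice for this kind of question is the one-sided hypergeometric test, sketched below as an assumption; the function and parameter names are illustrative.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster_size, hits):
    """Upper-tail hypergeometric p-value for annotation enrichment.

    population:   total number of genes considered
    annotated:    genes in the population carrying the annotation
    cluster_size: genes in the cluster being tested
    hits:         genes in the cluster carrying the annotation
    Returns P(X >= hits) when cluster_size genes are drawn at random
    without replacement from the population.
    """
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

For example, with 10 genes of which 3 are annotated, a cluster of 3 genes that are all annotated has probability 1/120 under random drawing. When many annotations and clusters are tested, the resulting p-values would normally need a multiple-testing correction.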
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A decomposition model to track gene 
expression signatures: preview on
observer-independent classification of ovarian cancer.
Martoglio Ann-Marie; Miskin James W; Smith Stephen K; 
MacKay David J C 

Department of Pathology, University of Cambridge, Tennis 
Court Road, Cambridge, CB2 1QP, UK., amm53@cam.ac.uk 
Bioinformatics (Oxford, England), (2002 Dec) 18 (12)
1617-24.

Journal code: 9808944. ISSN: 1367-4803. 
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Journal; Article; (JOURNAL ARTICLE) 
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English 

Priority Journals 
200307 

Entered STN: 20030124 
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MOTIVATION: A number of algorithms and analytical models have been 
employed to reduce the multidimensional complexity of DNA array data and 
attempt to extract some meaningful interpretation of the results. These 
include clustering, principal components analysis, self-organizing maps,
and support vector machine analysis. Each method assumes an implicit
model for the data, many of which separate genes into distinct clusters
defined by similar expression profiles in the samples tested. A point of
concern is that many genes may be involved in a number of distinct
behaviours, and should therefore be modelled to fit into as many separate
clusters as detected in the multidimensional gene
expression space. The analysis of gene
expression data using a decomposition model that is independent of
the observer involved would be highly beneficial to improve standard and
reproducible classification of clinical and research samples. RESULTS: We
present a variational independent component
analysis (ICA) method for reducing high dimensional DNA array data
to a smaller set of latent variables, each associated with a gene
signature. We present the results of applying the method to data from an
ovarian cancer study, revealing a number of tissue type-specific and
tissue type-independent gene signatures present in varying amounts among
the samples surveyed. The observer independent results of such molecular
analysis of biological samples could help identify patients who would
benefit from different treatment strategies. We further explore the
application of the model to similar high-throughput studies.
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AB MOTIVATION: The expression of genes is controlled by specific combinations 
of cellular variables. We applied Independent Component
Analysis (ICA) to gene expression data,
deriving a linear model based on hidden variables, which we term
'expression modes'. The expression of each gene is a
linear function of the expression modes, where, according to the ICA
model, the linear influences of different modes show a minimal statistical
dependence, and their distributions deviate sharply from the normal
distribution. RESULTS: Studying cell cycle-related gene
expression in yeast, we found that the dominant expression modes
could be related to distinct biological functions, such as phases of the
cell cycle or the mating response. Analysis of human lymphocytes revealed
modes that were related to characteristic differences between cell types.
With both data sets, the linear influences of the dominant modes showed
distributions with large tails, indicating the existence of specifically
up- and downregulated target genes. The expression modes and their
influences can be used to visualize the samples and genes in
low-dimensional spaces. A projection to expression modes helps to
highlight particular biological functions, to reduce noise, and to
compress the data in a biologically sensible way.
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Correlation of gene expression profiles 
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Current Drug Discovery 
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United Kingdom 
Journal; Article 
016 Cancer 
022 Human Genetics 

027 Biophysics, Bioengineering and Medical Instrumentation
029 Clinical Biochemistry 

English 
English 



Recent advances in expression profiling technologies now allow the study 
of gene expression and disease-related changes on a
genome-wide scale. However, the quality of information obtained from
expression profile data alone is limited. In contrast, correlation of 
expression profiles with a variety of clinical parameters should allow a 
more extensive analysis. The major challenge is the complexity of these 
datasets, which necessitates the use of sophisticated statistical 
algorithms, and the extraction of biological knowledge. 
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Blind gene classification - an application of a signal 
separation method 

Hori, Gen; Nishimura, Shin-ichi; Inoue, Masato;
Nakahara, Hiroyuki

Laboratory for Advanced Brain Signal Processing, Brain 
Science Institute, Wako-shi, Saitama, 351-0198, Japan 
Genome Informatics Series (2001), 12, 255-256 
CODEN: GINSE9; ISSN: 0919-9454
Universal Academy Press 
Journal 
English 

A new method based on independent component anal. (ICA) is shown to be a
promising approach to automatic gene classification. Although ICA is
similar to principal component anal. (PCA), ICA has some advantage to PCA
because it exploits higher order statistics and has no restriction to
orthogonal transformations. The validity of the new method is illustrated
by application to previously published yeast sporulation gene
expression data. The data consists of expression data of 6118
genes in yeast genome which were sampled at seven different times during
sporulation. The classified groups by the ICA-based method have a good
match with the classified groups based on manually obtained model
profiles. It is notable that ICA-based method does not require a domain
knowledge on genome and automatically classifies genes without any manual
labor.

REFERENCE COUNT: 4 THERE ARE 4 CITED REFERENCES AVAILABLE FOR THIS 
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US 6552164 B1 USPAT
US 6538119 B2 USPAT
US 6537759 B1 USPAT
US 6518068 B1 USPAT
US 6516276 B1 USPAT
US 6512580 B1 USPAT
US 6505125 B1 USPAT
US 6465181 B2 USPAT
US 6448474 B1 USPAT
US 6433019 B1 USPAT
US 6403332 B1 USPAT
US 6399371 B1 USPAT
US 6391543 B2 USPAT
US 6379929 B1 USPAT
US 6379671 B1 USPAT
US 6368792 B1 USPAT
US 6350583 B1 USPAT
US 6306273 B1 USPAT
US 6300063 B1 USPAT
US 6297018 B1 USPAT
US 6265423 B1 USPAT
US 6252047 B1 USPAT
US 6232456 B1 USPAT
US 6212824 B1 USPAT
US 6210701 B1 USPAT
US 6207642 B1 USPAT
US 6207380 B1 USPAT
US 6183952 B1 USPAT

Patent Numbers 2/7/05

US 6171787 B1 USPAT
US 6165734 A USPAT
US 6156576 A USPAT
US 6130043 A USPAT
US 6110675 A USPAT
US 6103199 A USPAT
US 6066459 A USPAT
US 6055325 A USPAT
US 6051559 A USPAT
US 6015670 A USPAT
US 6007996 A USPAT
US 5994075 A USPAT
US 5952180 A USPAT
US 5939265 A USPAT
US 5936731 A USPAT
US 5929223 A USPAT
US 5919638 A USPAT
US 5817462 A USPAT
US 5814454 A USPAT
US 5804386 A USPAT
US 5764819 A USPAT
JP 2004355174 A DERWENT
US 20040180365 A DERWENT
US 20030207278 A DERWENT
WO 200123614 A DERWENT
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